Show package setup
# Packages and settings
library(CASdatasets)
library(fairmodels)
library(tidyverse)
library(ggpubr)
library(scales)
library(knitr)
library(kableExtra)
library(xgboost)
set.seed(14)
lambda_all <- 1A practical framework for fair algorithmic pricing
Insurance pricing is fundamentally based on risk classification, where policyholders are grouped according to their expected risk levels and charged premiums that reflect their expected losses. Traditionally, insurers rely on statistical models and risk factors such as age, driving history, vehicle characteristics, and location to estimate premiums. Although risk classification improves predictive accuracy and reduces adverse selection, concerns have increasingly emerged regarding unfair discrimination against protected groups.
In many jurisdictions, insurers are prohibited from directly using certain protected characteristics, such as race, ethnicity, or gender, in pricing decisions. However, even when protected attributes are excluded from the model, indirect discrimination may still arise through the use of proxy variables or complex predictive algorithms. The growing use of machine learning and large-scale data analytics in insurance has therefore raised important questions about fairness, transparency, and regulation in insurance pricing.
This case study is based on Xin and Huang (2024) and implements Step 2 at the cost-modelling stage only. It estimates fair expected claim costs and ignores demand, expense and profit loadings, and price optimisation. Those components are limitations of this illustration, not of Step 2 itself. The case study compares five model designs (M0–MC) linked to fairness criteria from Step 2:
The analysis is designed for policy and model evaluation. It shows how fairness interventions affect predicted pure premiums, disparate impact, and predictive accuracy, not a filing-ready rate plan for any market.
Not every section is needed for every reader. The table below suggests a practical reading order.
| If your focus is… | Start here | Then read |
|---|---|---|
| Which fairness criterion maps to which model | Pricing models | Fairness–accuracy trade-off |
| What the French data results show | Model summaries | Premium differences → Double lift charts |
| Implementation detail | Data and scenario design → Pricing models | Folded code chunks (setup through pure premium prediction) |
This case study applies one illustrative antidiscrimination design from Xin and Huang (2024) to French motor data. In your market you must decide:
Pure premium vs commercial rate. Following the paper, \hat{Y} is expected claim cost (frequency × severity), not a filed commercial premium. Xin and Huang (2024) treat this as approximately the price charged when expense and profit loadings are ignored. In practice, insurers also add acquisition and administrative expense, cost of capital, reinsurance, profit margin, and taxes. Fairness at the pure-premium stage does not guarantee fairness in final rates if those loadings differ across groups. Once cost models are set, Case Study 2 illustrates how demand and pricing rules can further change premiums and welfare.
Portfolio-level adjustments align total predicted premium across some models with the unawareness benchmark so comparisons focus on redistribution, not portfolio level. Adapt this step to your governance needs.
Some implementation details are folded to improve readability, while the main modelling workflow and outputs remain available for inspection.
# Packages and settings
library(CASdatasets)
library(fairmodels)
library(tidyverse)
library(ggpubr)
library(scales)
library(knitr)
library(kableExtra)
library(xgboost)
set.seed(14)
lambda_all <- 1# Scenario setup
# In this scenario, Insurance score is treated as the
# non-legitimate variable.
non_legitimate <- c("Insurancescore")
glm_predictors <- c(
"Age.ct",
"Bonus",
"Value",
"Density",
"GroupOne",
"Insurancescore"
)
xgb_predictors <- glm_predictors# Helper functions
make_age_group <- function(age) {
cut(
age,
breaks = c(-Inf, 22, 27, 32, 37, 42, 47, 52, 57, 62, 67, Inf),
labels = c("18-22", "23-27", "28-32", "33-37", "38-42", "43-47",
"48-52", "53-57", "58-62", "63-67", "68+"),
right = TRUE
)
}
# Convert a jittered continuous variable back to integer-like factor levels.
# This is used after disparate impact removal for variables that were originally categorical.
bin_to_factor <- function(x, min_level, max_level) {
levels <- min_level:max_level
x <- pmin(pmax(x, min_level), max_level)
cuts <- c(min_level - 0.5, levels[-1] - 0.5, max_level + 0.5)
as.factor(as.numeric(cut(x, breaks = cuts, labels = levels, include.lowest = TRUE)))
}
make_artificial_proxies <- function(data) {
# W variables are more likely to be 1 for females; M variables are more likely to be 1 for males.
for (v in paste0("W", 1:5)) {
u <- runif(nrow(data))
data[[v]] <- ifelse(data$Gender == "Female", u < 0.6, u < 0.4) * 1
}
for (v in paste0("M", 1:5)) {
u <- runif(nrow(data))
data[[v]] <- ifelse(data$Gender == "Male", u < 0.6, u < 0.4) * 1
}
data
}
prepare_pricing_data <- function(data) {
data <- data %>%
rename(Age.ct = Age) %>%
mutate(
Age = make_age_group(Age.ct),
Group1 = as.factor(Group1),
Adind = as.factor(Adind),
Bonus = as.factor(Bonus),
Gender = relevel(Gender, ref = "Male"),
Age = relevel(Age, ref = "18-22"),
Type = relevel(Type, ref = "A"),
Occupation = relevel(Occupation, ref = "Employed"),
Category = relevel(Category, ref = "Medium"),
Group1 = relevel(Group1, ref = "1"),
Group2 = relevel(Group2, ref = "L"),
Female = as.integer(Gender == "Female"),
ClaimSeverity = Indtppd / Numtppd
) %>%
make_artificial_proxies()
# Numeric labels are used only to simplify model matrix names and teaching plots.
data <- data %>%
mutate(
Age = as.factor(as.numeric(Age)),
Type = as.factor(as.numeric(Type)),
Occupation = as.factor(as.numeric(Occupation)),
Category = as.factor(as.numeric(Category)),
Group1 = as.factor(as.numeric(Group1)),
Group2 = as.factor(as.numeric(Group2)),
Bonus = as.factor(as.numeric(Bonus))
) %>%
rename(
W_one = W1, W_two = W2, W_three = W3, W_four = W4, W_five = W5,
M_one = M1, M_two = M2, M_three = M3, M_four = M4, M_five = M5,
GroupOne = Group1, GroupTwo = Group2
) %>%
dplyr::slice(-(1:21)) %>% # remove duplicate entries, as in the original code
select(-PolNum, -CalYear)
data
}
add_insurance_score <- function(data) {
# Insurance score is constructed as the fitted probability of having a non-zero claim.
score_model <- glm(
I(Indtppd != 0) ~ Type + Category + Occupation + GroupTwo + Age,
family = binomial(link = "logit"),
data = data
)
data$Insurancescore <- fitted(score_model)
list(data = data, model = score_model)
}
make_gender_proxy <- function(data) {
proxy_vars <- c("M_one", "M_two", "M_three", "M_four", "M_five",
"W_one", "W_two", "W_three", "W_four", "W_five")
glm(Female ~ ., family = binomial(link = "logit"), data = data[, c("Female", proxy_vars)])
}
make_di_removed_data <- function(data, features_to_transform, lambda = 1) {
data_for_di <- data %>%
mutate(
GroupOne.ct = as.numeric(GroupOne),
Bonus.ct = as.numeric(Bonus)
)
# Jitter avoids problems caused by many ties, while preserving min/max observations.
jitter_keep_minmax <- function(x, factor = 1) {
keep <- x == min(x, na.rm = TRUE) | x == max(x, na.rm = TRUE)
x[!keep] <- jitter(x[!keep], factor = factor)
x
}
data_for_di <- data_for_di %>%
mutate(
Age.ct = jitter_keep_minmax(Age.ct, factor = 1),
Bonus.ct = jitter_keep_minmax(Bonus.ct, factor = 1),
Value = jitter_keep_minmax(Value, factor = 1),
Density = jitter_keep_minmax(Density, factor = 1),
GroupOne.ct = jitter_keep_minmax(GroupOne.ct, factor = 5),
Insurancescore = jitter_keep_minmax(Insurancescore, factor = 1)
)
di_data <- disparate_impact_remover(
data = data_for_di,
protected = as.factor(data_for_di$Gender),
features_to_transform = features_to_transform,
lambda = lambda
)
di_data <- di_data %>%
mutate(
Age = make_age_group(Age.ct),
Age = as.factor(as.numeric(Age)),
GroupOne = bin_to_factor(GroupOne.ct, 1, 20),
Bonus = bin_to_factor(Bonus.ct, 1, 21)
)
di_data
}
fit_gamma_model <- function(formula, data) {
glm(formula, family = Gamma(link = "log"), data = data)
}
fit_poisson_model <- function(formula, data) {
glm(formula, family = poisson(link = "log"), data = data, offset = log(Exppdays))
}
adjust_to_base_portfolio <- function(raw_premium, base_premium) {
raw_premium * sum(base_premium) / sum(raw_premium)
}
make_xgb_design <- function(data) {
xgb_unused <- c(
"SubGroup2", "Age", "GroupOne.ct", "Female", "ClaimSeverity",
"Type", "Category", "Occupation", "GroupTwo", "Adind", "Poldur",
"Numtpbi", "Indtpbi", "Bonus.ct", "GP_predictors"
)
data %>%
select(-any_of(xgb_unused)) %>%
model.matrix(~ 0 + ., data = .) %>%
as.data.frame() %>%
select(-any_of("GenderMale"))
}
make_xgb_dmatrix <- function(data, response = NULL, base_margin, drop_vars) {
x <- data %>%
select(-any_of(drop_vars)) %>%
as.matrix()
if (is.null(response)) {
dmat <- xgb.DMatrix(x)
} else {
dmat <- xgb.DMatrix(x, label = data[[response]])
}
setinfo(dmat, "base_margin", log(base_margin))
dmat
}
align_xgb_design <- function(new_data, train_data) {
missing_cols <- setdiff(names(train_data), names(new_data))
if (length(missing_cols) > 0) {
new_data[missing_cols] <- 0
}
new_data[, names(train_data), drop = FALSE]
}The original study also considered the use of an artificial proxy for gender to illustrate how indirect discrimination may persist even when the protected variable itself is removed from the pricing model.
An example of this construction is included in the code above. The resulting proxy is not used in the remainder of this case study, but interested readers may explore its impact by incorporating it into the pricing models and comparing the resulting premiums.
We use the pg15training dataset from the CASdatasets package (Dutang, Charpentier, and Dutang 2020), which was originally released for the first French motor insurance pricing competition organised by the French Institute of Actuaries in 2015. The dataset contains 100,000 third-party liability (TPL) motor insurance policies observed during 2009 and 2010. TPL insurance covers damage caused to third parties when the insured driver is responsible for an accident. In this case study, we focus on third-party material damage claims, which occur more frequently than bodily injury claims in the dataset.
Following the frequency–severity framework commonly used in actuarial pricing, we model claim frequency and claim severity separately and combine them to obtain predicted pure premiums. To illustrate how antidiscrimination pricing models can be implemented under both traditional and machine-learning pricing frameworks, this case study considers both generalized linear models (GLMs) and Extreme Gradient Boosting (XGBoost).
Gender is treated as the protected variable throughout the analysis. The pricing models use the following nonprotected rating variables: Age, Bonus, Group1 (vehicle group), Density (population density), and Value (vehicle value). Following Schelldorfer and Wuthrich (2019), Age is fitted as a continuous function form in the GLMs.
We also construct an insurance score (Insurancescore) for each policyholder. It is the fitted probability from a logistic regression of claim incidence on vehicle type, category, occupation, region (Group2), and age,
I(\text{Indtppd} \neq 0) \sim \text{Type} + \text{Category} + \text{Occupation} + \text{GroupTwo} + \text{Age}.
Type, category, occupation, and region are categorical in the raw data. They are not entered separately in the frequency and severity models. Only the fitted score enters those models as a single continuous predictor.
We follow the baseline scenario in Xin and Huang (2024) and treat Insurancescore as the only non-legitimate rating variable. Conceptually it stands in for a proxy-rich factor, such as a credit-based insurance score, that may correlate with the protected attribute and attract stricter fairness scrutiny even when it retains some predictive value.
There is also a practical reason for this construction. Models 3 and 4 debias predictors with a disparate-impact remover (Feldman et al. 2015) implemented in the fairmodels package. That routine is defined for continuous inputs. Summarising the categorical proxy variables in one continuous score makes conditional debiasing under MCDP feasible while leaving the legitimate rating variables unchanged.
Our response variable is pure premium (expected claim cost), defined as
\text{Pure Premium} = \text{Frequency} \times \text{Severity}.
This is the \hat{Y} target in Xin and Huang (2024). It is the technical price stage: loadings for expense, capital, reinsurance, and profit are out of scope here (see For practitioners above).
In this scenario, all rating variables other than Insurancescore are regarded as legitimate. Legitimate variables are predictors that regulators approve for unrestricted use. Non-legitimate variables may still appear in pricing but are subject to additional fairness constraints or debiasing. This setup allows us to compare different fairness criteria and pricing approaches while keeping the empirical analysis relatively simple.
Finally, the first 21 records are removed prior to modelling because they are duplicate observations with non-zero claim counts but zero claim amounts. After their removal, the dataset contains exactly 50,000 policies from each year (2009 and 2010), consistent with the original study.
# Data preparation
data(pg15training, pg15pricing)
ClaimsData <- prepare_pricing_data(pg15training)
score_out <- add_insurance_score(ClaimsData)
ClaimsData <- score_out$data
CSLogit1 <- score_out$model
# The gender proxy is not used in the current scenario,
# but is retained for potential extensions and exercises.
FemLogit1 <- make_gender_proxy(ClaimsData)
ClaimsData$GP_predictors <- fitted(FemLogit1)
ClaimsData_rd <- ClaimsData %>%
filter(Indtppd > 0)To implement Models 3 (MDP, demographic parity model) and 4 (MCDP, conditional demographic parity model), we construct debiased versions of the explanatory variables using the disparate impact (DI) remover proposed by Feldman et al. (2015).
Model 3 (MDP) applies the DI remover to all selected predictors, aiming to remove their dependence on the protected variable. Model 4 (MCDP) follows the conditional demographic parity framework and applies the DI remover only to the selected non-legitimate variable, which in this case study is Insurancescore.
Before applying the DI remover, a small amount of random noise is added to ordinal variables. This expands the sample space and reduces the large number of tied observations that commonly occur in insurance datasets, while preserving the minimum and maximum observed values. After debiasing, variables that were originally categorical are mapped back to their corresponding factor levels.
The resulting datasets are subsequently used to fit Models 3 and 4.
features_to_transform <- c(
"Age.ct",
"Bonus.ct",
"Value",
"Density",
"GroupOne.ct",
"Insurancescore"
)
ClaimsData_DIremv <- make_di_removed_data(
ClaimsData,
features_to_transform,
lambda = lambda_all
)
# Model 3: apply DI remover to all selected predictors.
ClaimsData_M3 <- ClaimsData
ClaimsData_M3[, c("Age", "Age.ct", "Bonus", "Value", "Density", "GroupOne", "Insurancescore")] <-
ClaimsData_DIremv[, c("Age", "Age.ct", "Bonus", "Value", "Density", "GroupOne", "Insurancescore")]
# Model 4: apply DI remover only to the non-legitimate variable(s).
ClaimsData_M4 <- ClaimsData
m4_features <- intersect(glm_predictors, non_legitimate)
ClaimsData_M4[, m4_features] <- ClaimsData_DIremv[, m4_features]
ClaimsData_M3_rd <- ClaimsData_M3 %>%
filter(Indtppd > 0)
ClaimsData_M4_rd <- ClaimsData_M4 %>%
filter(Indtppd > 0)Following Xin and Huang (2024), we consider five pricing models, including a benchmark pricing model and several antidiscrimination pricing approaches. Each approach is implemented using both GLM and XGBoost. For the GLM implementation, claim severity is modelled using a Gamma GLM with a log link, while claim frequency is modelled using a Poisson GLM with a log link and exposure offset. For the XGBoost implementation, claim frequency and severity are modelled using appropriate boosting objectives for count and positive continuous outcomes. Predicted pure premiums are obtained by combining the corresponding frequency and severity predictions.
The five approaches considered in this case study are referred to as Model 1 to Model 5 for consistency with the implementation code. Their corresponding notation in Xin and Huang (2024) is shown in the table below.
| Case Study Model | Paper Notation | Description |
|---|---|---|
| Model 1 | M0 | Full model including Gender |
| Model 2 | MU | Unawareness model excluding Gender |
| Model 3 | MDP | Demographic parity model fitted using debiased predictors |
| Model 4 | MCDP | Conditional demographic parity model fitted using legitimate and debiased non-legitimate predictors |
| Model 5 | MC | Post-processing approach based on controlling for the protected variable (CPV) |
# Model specifications
# M0: Full model including Gender
# MU: Fairness through unawareness (Gender removed)
# MDP: Fitted using debiased versions of all selected predictors
# MCDP: Fitted using legitimate variables and a debiased
# version of the non-legitimate variable
#
# Models with suffix B additionally include the
# artificial gender proxy predictor.
sev_formula_m1 <- ClaimSeverity ~
Age.ct + log(Age.ct) + I(Age.ct^2) + I(Age.ct^3) +
I(Age.ct^4) + Bonus + Density + Gender + Insurancescore
sev_formula_m2 <- update(sev_formula_m1, . ~ . - Gender)
freq_formula_m1 <- Numtppd ~
Age.ct + log(Age.ct) + I(Age.ct^2) + I(Age.ct^3) +
I(Age.ct^4) + Bonus + Density + GroupOne +
Insurancescore + Gender
freq_formula_m2 <- update(freq_formula_m1, . ~ . - Gender)
# Severity models
SevGamma1 <- fit_gamma_model(sev_formula_m1, ClaimsData_rd)
SevGamma2 <- fit_gamma_model(sev_formula_m2, ClaimsData_rd)
SevGamma3 <- fit_gamma_model(sev_formula_m2, ClaimsData_M3_rd)
SevGamma4 <- fit_gamma_model(sev_formula_m2, ClaimsData_M4_rd)
# Frequency models
FreqPoisson1 <- fit_poisson_model(freq_formula_m1, ClaimsData)
FreqPoisson2 <- fit_poisson_model(freq_formula_m2, ClaimsData)
FreqPoisson3 <- fit_poisson_model(freq_formula_m2, ClaimsData_M3)
FreqPoisson4 <- fit_poisson_model(freq_formula_m2, ClaimsData_M4)xgbData <- make_xgb_design(ClaimsData)
# Model 3 (MDP): follow the original XGBoost implementation by
# using the fully debiased design matrix directly.
xgbData_M3 <- make_xgb_design(ClaimsData_DIremv)
# Model 4 (MCDP): start from the original XGBoost design and replace
# only the non-legitimate variable(s). In Scenario 1 this is Insurancescore.
xgbData_M4 <- xgbData
m4_train_vars <- intersect(names(xgbData_M4), non_legitimate)
xgbData_M4[, m4_train_vars] <- xgbData_M3[, m4_train_vars]
drop_common <- c(
"Exppdays", "Numtppd", "Indtppd",
"W_one", "W_two", "W_three", "W_four", "W_five",
"M_one", "M_two", "M_three", "M_four", "M_five"
)
# Frequency models
xgbData_FM1 <- make_xgb_dmatrix(
xgbData,
response = "Numtppd",
base_margin = xgbData$Exppdays,
drop_vars = drop_common
)
xgbData_FM2 <- make_xgb_dmatrix(
xgbData,
response = "Numtppd",
base_margin = xgbData$Exppdays,
drop_vars = c(drop_common, "GenderFemale")
)
xgbData_FM3 <- make_xgb_dmatrix(
xgbData_M3,
response = "Numtppd",
base_margin = xgbData_M3$Exppdays,
drop_vars = c(drop_common, "GenderFemale")
)
xgbData_FM4 <- make_xgb_dmatrix(
xgbData_M4,
response = "Numtppd",
base_margin = xgbData_M4$Exppdays,
drop_vars = c(drop_common, "GenderFemale")
)
paramsFreq <- list(
objective = "count:poisson",
eval_metric = "poisson-nloglik",
max_depth = 2,
eta = 0.05,
min_child_weight = 3,
subsample = 0.8,
colsample_bytree = 0.8,
tree_method = "hist"
)
set.seed(358)
xgbFreq1 <- xgb.train(xgbData_FM1, nrounds = 3958, params = paramsFreq)
xgbFreq2 <- xgb.train(xgbData_FM2, nrounds = 3988, params = paramsFreq)
xgbFreq3 <- xgb.train(xgbData_FM3, nrounds = 3151, params = paramsFreq)
xgbFreq4 <- xgb.train(xgbData_FM4, nrounds = 3987, params = paramsFreq)
# Severity models
xgbData_sev <- xgbData %>% filter(Indtppd > 0)
xgbData_M3_sev <- xgbData_M3 %>% filter(Indtppd > 0)
xgbData_M4_sev <- xgbData_M4 %>% filter(Indtppd > 0)
xgbData_SM1 <- make_xgb_dmatrix(
xgbData_sev,
response = "Indtppd",
base_margin = xgbData_sev$Numtppd,
drop_vars = drop_common
)
xgbData_SM2 <- make_xgb_dmatrix(
xgbData_sev,
response = "Indtppd",
base_margin = xgbData_sev$Numtppd,
drop_vars = c(drop_common, "GenderFemale")
)
xgbData_SM3 <- make_xgb_dmatrix(
xgbData_M3_sev,
response = "Indtppd",
base_margin = xgbData_M3_sev$Numtppd,
drop_vars = c(drop_common, "GenderFemale")
)
xgbData_SM4 <- make_xgb_dmatrix(
xgbData_M4_sev,
response = "Indtppd",
base_margin = xgbData_M4_sev$Numtppd,
drop_vars = c(drop_common, "GenderFemale")
)
paramsSev <- list(
objective = "reg:gamma",
max_depth = 1,
eta = 0.015,
min_child_weight = 2,
subsample = 0.8,
colsample_bytree = 0.8,
tree_method = "hist"
)
set.seed(946)
xgbSev1 <- xgb.train(xgbData_SM1, nrounds = 2239, params = paramsSev)
xgbSev2 <- xgb.train(xgbData_SM2, nrounds = 2113, params = paramsSev)
xgbSev3 <- xgb.train(xgbData_SM3, nrounds = 2333, params = paramsSev)
xgbSev4 <- xgb.train(xgbData_SM4, nrounds = 1908, params = paramsSev)The following tables summarise the in-sample model fit and the mean predicted pure premiums by gender under both GLM and XGBoost implementations. These summaries provide an initial check of how the different pricing approaches affect the relative premium levels across protected groups.
glm_mean <- glmpred_sum %>%
group_by(Gender) %>%
summarise(
Method = "GLM",
M0 = mean(PurePrem1),
MU = mean(PurePrem2),
MDP = mean(PurePrem3),
MCDP = mean(PurePrem4),
MC = mean(PurePrem5),
.groups = "drop"
)
xgb_mean <- xgbpred_sum %>%
group_by(Gender) %>%
summarise(
Method = "XGBoost",
M0 = mean(xgbPurePrem1),
MU = mean(xgbPurePrem2),
MDP = mean(xgbPurePrem3),
MCDP = mean(xgbPurePrem4),
MC = mean(xgbPurePrem5),
.groups = "drop"
)
mean_premium_table <- bind_rows(glm_mean, xgb_mean) %>%
mutate(
Group = paste(Method, tolower(as.character(Gender)))
) %>%
select(Group, M0, MU, MDP, MCDP, MC)
kable_styling(
knitr::kable(
mean_premium_table,
digits = 2,
caption = "Comparison of Mean Predicted Pure Premiums by Model, Method, and Gender"
),
font_size = 10
)| Group | M0 | MU | MDP | MCDP | MC |
|---|---|---|---|---|---|
| GLM male | 130.47 | 114.03 | 118.00 | 115.25 | 113.95 |
| GLM female | 95.66 | 124.05 | 117.16 | 121.92 | 124.18 |
| XGBoost male | 131.01 | 114.40 | 118.19 | 116.62 | 114.21 |
| XGBoost female | 94.58 | 123.41 | 116.82 | 119.54 | 123.73 |
Following Xin and Huang (2024), we summarise group fairness using the disparate impact ratio,
\text{Disparate Impact Ratio} = \frac{\mathbb{E}(\hat{Y}=\hat{y}|X_{P}=b)}{\mathbb{E}(\hat{Y}=\hat{y}|X_{P}=a)}. This measure compares the average predicted pure premiums between two protected groups. A value close to 1 indicates similar average technical prices across groups (not necessarily filed commercial rates). To approximate the insurance version of the four-fifths rule, we expect this fairness score to lie between 0.8 and 1.25. These thresholds are shown as dotted reference lines in the fairness–accuracy plots.
Predictive accuracy is summarised using the root mean squared error (RMSE),
\text{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^n(y_i-\hat y_i)^2},
and the normalised Gini index,
\text{Normalized Gini Index}=\frac{\frac{\sum_{i=1}^n y_i\,r(\hat y_i)}{\sum_{i=1}^n y_i}-\sum_{i=1}^n \frac{n-i+1}{n}}{\frac{\sum_{i=1}^n y_i\,r(y_i)}{\sum_{i=1}^n y_i}-\sum_{i=1}^n \frac{n-i+1}{n}}. The normalised Gini index evaluates the ranking performance of a pricing model. It uses predicted pure premiums only through their relative rankings, and a larger value indicates better predictive performance.
The following plots compare the fairness–accuracy trade-off across the pricing models under both GLM and XGBoost implementations.
normalised_gini <- function(actual, predicted, weight = NULL) {
if (is.null(weight)) weight <- rep(1, length(actual))
keep <- is.finite(actual) & is.finite(predicted) &
is.finite(weight) & weight > 0
actual <- actual[keep]
predicted <- predicted[keep]
weight <- weight[keep]
weighted_gini <- function(y, score, w) {
ord <- order(score, decreasing = FALSE)
y <- y[ord]
w <- w[ord]
cum_w <- cumsum(w) / sum(w)
cum_y <- cumsum(y * w) / sum(y * w)
lorenz_area <- sum(
(cum_y[-1] + cum_y[-length(cum_y)]) / 2 * diff(cum_w)
)
0.5 - lorenz_area
}
weighted_gini(actual, predicted, weight) /
weighted_gini(actual, actual, weight)
}
glm_model_premiums <- glmpred_sum %>%
transmute(
Method = "GLM",
Gender,
Exppdays,
actual = realclaim,
`M0: Full Model` = PurePrem1,
`MU: Unawareness Model` = PurePrem2,
`MDP: Demographic Parity` = PurePrem3,
`MCDP: Conditional Demographic Parity` = PurePrem4,
`MC: Controlling for the Protected Variable` = PurePrem5
)
xgb_model_premiums <- xgbpred_sum %>%
transmute(
Method = "XGBoost",
Gender,
Exppdays,
actual = realclaim,
`M0: Full Model` = xgbPurePrem1,
`MU: Unawareness Model` = xgbPurePrem2,
`MDP: Demographic Parity` = xgbPurePrem3,
`MCDP: Conditional Demographic Parity` = xgbPurePrem4,
`MC: Controlling for the Protected Variable` = xgbPurePrem5
)
model_premiums <- bind_rows(
glm_model_premiums,
xgb_model_premiums
)
fairness_accuracy_data <- model_premiums %>%
pivot_longer(
cols = -c(Method, Gender, Exppdays, actual),
names_to = "Model",
values_to = "Premium"
) %>%
group_by(Method, Model) %>%
summarise(
male_mean = mean(Premium[Gender == "Male"], na.rm = TRUE),
female_mean = mean(Premium[Gender == "Female"], na.rm = TRUE),
disparate_impact_ratio = male_mean / female_mean,
rmse = sqrt(mean((actual - Premium)^2, na.rm = TRUE)),
normalised_gini = normalised_gini(actual, Premium, weight = Exppdays),
.groups = "drop"
) %>%
mutate(
Model = factor(
Model,
levels = c(
"M0: Full Model",
"MU: Unawareness Model",
"MDP: Demographic Parity",
"MCDP: Conditional Demographic Parity",
"MC: Controlling for the Protected Variable"
)
),
Method = factor(Method, levels = c("GLM", "XGBoost"))
)
fairness_gini_plot <- ggplot(
fairness_accuracy_data,
aes(
x = normalised_gini,
y = disparate_impact_ratio,
colour = Model,
shape = Method
)
) +
geom_hline(yintercept = c(0.8, 1.25), linetype = "dotted") +
geom_hline(yintercept = 1, linetype = "dotdash") +
geom_point(size = 3) +
labs(
x = "Normalized Gini Index (Accuracy)",
y = "Disparate Impact Ratio (Fairness)",
colour = "Model",
shape = "Method"
) +
theme_minimal()
fairness_rmse_plot <- ggplot(
fairness_accuracy_data,
aes(
x = rmse,
y = disparate_impact_ratio,
colour = Model,
shape = Method
)
) +
geom_hline(yintercept = c(0.8, 1.25), linetype = "dotted") +
geom_hline(yintercept = 1, linetype = "dotdash") +
geom_point(size = 3) +
labs(
x = "Root Mean Square Error (Accuracy)",
y = "Disparate Impact Ratio (Fairness)",
colour = "Model",
shape = "Method"
) +
theme_minimal()
ggarrange(
fairness_gini_plot,
fairness_rmse_plot,
common.legend = TRUE,
legend = "right"
)Double lift charts are commonly used in insurance pricing to compare the relative performance of two competing rating plans. Following Goldburd et al. (2016), they can be used to identify where two models disagree most and whether the observed claims experience is closer to one model or the other.
From a fairness perspective, insurers may be concerned that antidiscrimination pricing models could weaken actuarial (individual) fairness and increase adverse selection. To explore this issue, we focus on the GLM implementation and compare a fair model with the benchmark unawareness model (GLM MU), which represents a common industry practice.
In this case study, we compare GLM MDP (Model 3) against GLM MU (Model 2). The construction of a double lift chart consists of three steps:
\text{Pure Premium Ratio} = \frac{ \text{Predicted Premium of Benchmark Model} }{ \text{Predicted Premium of Fair Model} }.
Sort policies according to the pure premium ratio and divide them into groups with approximately equal exposure.
Within each group, calculate the average predicted premium under each model and the average observed claim cost.
The first and last groups correspond to policyholders for whom the two models disagree most. Therefore, double lift charts provide a useful way to investigate how premium redistribution may affect adverse selection and customer movement between competing insurers.
add_equal_weight_group <- function(
table,
sort_by,
expo = NULL,
group_variable_name = "groupe",
nb = 10
) {
sort_by_var <- enquo(sort_by)
if (!missing(expo)) {
expo_var <- enquo(expo)
total <- table %>%
pull(!!expo_var) %>%
sum()
breaks <- seq(0, total, length.out = nb + 1) %>%
head(-1) %>%
c(Inf) %>%
unique()
table %>%
arrange(!!sort_by_var) %>%
mutate(
cum_exposure = cumsum(!!expo_var)
) %>%
mutate(
!!group_variable_name := cut(
cum_exposure,
breaks = breaks,
ordered_result = TRUE,
include.lowest = TRUE
) %>%
as.numeric()
) %>%
select(-cum_exposure)
} else {
total <- nrow(table)
breaks <- seq(0, total, length.out = nb + 1) %>%
head(-1) %>%
c(Inf) %>%
unique()
table %>%
arrange(!!sort_by_var) %>%
mutate(
cum_exposure = row_number()
) %>%
mutate(
!!group_variable_name := cut(
cum_exposure,
breaks = breaks,
ordered_result = TRUE,
include.lowest = TRUE
) %>%
as.numeric()
) %>%
select(-cum_exposure)
}
}
double_lift_chart <- function(
data,
pred1,
pred2,
expo,
obs,
nb = 8,
obs_lab = "Actual Claims",
pred1_lab = "Benchmark Model",
pred2_lab = "Comparison Model"
) {
pred1_var <- enquo(pred1)
pred2_var <- enquo(pred2)
expo_var <- enquo(expo)
obs_var <- enquo(obs)
pred1_name <- quo_name(pred1_var)
pred2_name <- quo_name(pred2_var)
obs_name <- quo_name(obs_var)
chart_data <- data %>%
mutate(
ratio = !!pred1_var / !!pred2_var
) %>%
filter(!!expo_var > 0) %>%
drop_na(
ratio,
!!pred1_var,
!!pred2_var,
!!obs_var,
!!expo_var
) %>%
add_equal_weight_group(
sort_by = ratio,
expo = !!expo_var,
nb = nb
) %>%
group_by(groupe) %>%
summarise(
ratio_mean = mean(ratio),
ratio_min = min(ratio),
ratio_max = max(ratio),
exposure = sum(!!expo_var),
actual = sum(!!obs_var * !!expo_var) / exposure,
pred1_avg = sum(!!pred1_var * !!expo_var) / exposure,
pred2_avg = sum(!!pred2_var * !!expo_var) / exposure,
.groups = "drop"
) %>%
mutate(
label = paste0(
"[",
round(ratio_min, 2),
", ",
round(ratio_max, 2),
"]"
)
)
plot_data <- chart_data %>%
select(
groupe,
ratio_mean,
label,
actual,
pred1_avg,
pred2_avg
) %>%
pivot_longer(
c(actual, pred1_avg, pred2_avg),
names_to = "Series",
values_to = "Value"
) %>%
mutate(
Series = recode(
Series,
actual = obs_lab,
pred1_avg = pred1_lab,
pred2_avg = pred2_lab
)
)
ggplot(
plot_data,
aes(
x = ratio_mean,
y = Value,
colour = Series,
linetype = Series
)
) +
geom_line() +
geom_point() +
scale_x_continuous(
breaks = chart_data$ratio_mean,
labels = chart_data$label
) +
labs(
x = paste0(
"Pure Premium Ratio: ",
pred1_lab,
" / ",
pred2_lab
),
y = "Average Claims or Premiums",
colour = NULL,
linetype = NULL
) +
theme_minimal() +
theme(
axis.text.x = element_text(
angle = 30,
hjust = 1,
vjust = 1
)
)
}double_lift_all <- double_lift_chart(
data = glmpred_sum,
pred1 = PurePrem2,
pred2 = PurePrem3,
expo = Exppdays,
obs = realclaim,
nb = 8,
pred1_lab = "GLM MU",
pred2_lab = "GLM MDP"
)
double_lift_alldouble_lift_male <- double_lift_chart(
data = glmpred_sum %>% filter(Gender == "Male"),
pred1 = PurePrem2,
pred2 = PurePrem3,
expo = Exppdays,
obs = realclaim,
nb = 8,
pred1_lab = "GLM MU",
pred2_lab = "GLM MDP"
)
double_lift_female <- double_lift_chart(
data = glmpred_sum %>% filter(Gender == "Female"),
pred1 = PurePrem2,
pred2 = PurePrem3,
expo = Exppdays,
obs = realclaim,
nb = 8,
pred1_lab = "GLM MU",
pred2_lab = "GLM MDP"
)
ggarrange(
double_lift_male,
double_lift_female,
labels = c("Male", "Female"),
common.legend = TRUE,
legend = "bottom"
)